Correcting Input Noise in SMT as a Char-Based Translation Problem
Authors
Abstract
Misspelled words directly degrade the output quality of Statistical Machine Translation (SMT) systems, because they make the input noisy and unpredictable. This paper presents strategies for improving the translation of real-life noisy input. The proposed strategies rely on a preprocessing step consisting of a character-based machine translation (MT) system that translates noisy text into clean text. Working at the character level lets us pass several spelling alternatives, in lattice form, to the final bilingual translator, which then decides the best path to translate. The alternative hypotheses are obtained under a noisy channel model of the task. The paper reports experiments with real-life noisy input and a standard phrase-based SMT system translating from English into Spanish.
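Under a noisy channel view, each noisy token x is explained by clean candidates c ranked by P(c) · P(x | c), and several of them are handed to the translator as a lattice instead of committing to one. The sketch below illustrates only that idea and is not the paper's method: in place of a character-level translator it swaps in a simple one-edit candidate generator. The toy lexicon, the crude `error_prob` channel score, the cut-off `k`, and the PLF-style output (the bracketed lattice syntax that Moses accepts as lattice input) are all hypothetical assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy unigram language model: P(c) for a handful of clean words.
LEXICON: Dict[str, float] = {
    "the": 0.30, "house": 0.10, "horse": 0.05,
    "is": 0.25, "red": 0.08, "read": 0.07,
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """All strings one character edit away: delete, transpose, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + ch + b[1:] for a, b in splits if b for ch in ALPHABET}
    inserts = {a + ch + b for a, b in splits for ch in ALPHABET}
    return deletes | transposes | replaces | inserts


def candidates(noisy: str, error_prob: float = 0.1) -> List[Tuple[str, float]]:
    """Rank clean candidates c for a noisy token x by P(c) * P(x | c).

    Crude channel model (an assumption): P(x | c) is 1 - error_prob when
    x == c and error_prob when c is exactly one edit away from x.
    """
    scored: Dict[str, float] = {}
    if noisy in LEXICON:
        scored[noisy] = LEXICON[noisy] * (1 - error_prob)
    for c in edits1(noisy):
        if c != noisy and c in LEXICON:
            scored[c] = LEXICON[c] * error_prob
    if not scored:
        # Out-of-lexicon token with no close match: pass it through unchanged.
        return [(noisy, 1.0)]
    total = sum(scored.values())
    # Normalize to posteriors P(c | x), best first.
    return sorted(((c, s / total) for c, s in scored.items()), key=lambda t: -t[1])


def build_lattice(sentence: str, k: int = 3) -> List[List[Tuple[str, float]]]:
    """Confusion-network lattice: one column per token, up to k alternatives each."""
    return [candidates(tok)[:k] for tok in sentence.lower().split()]


def to_plf(lattice: List[List[Tuple[str, float]]]) -> str:
    """Serialize in PLF-style syntax; every edge advances one node (distance 1)."""
    columns = []
    for column in lattice:
        edges = ",".join(f"('{w}',{p:.4f},1)" for w, p in column)
        columns.append(f"({edges},)")
    return "(" + ",".join(columns) + ",)"


if __name__ == "__main__":
    print(to_plf(build_lattice("teh hosue is red")))
```

For the toy input "teh hosue is red", the first three columns each collapse to a single correction ("the", "house", "is"), while the last keeps both "red" and "read" with their posteriors. Leaving that residual ambiguity for the bilingual decoder to resolve is the reason for passing a lattice rather than a single corrected string.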
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. SMT, on the other hand, performs better at lexical choice. Therefore, in our sys...
Story Generation from Sequence of Independent Short Descriptions
Existing Natural Language Generation (NLG) systems are weak AI systems and exhibit limited capabilities when language generation tasks demand higher levels of creativity, originality and brevity. Effective solutions or, at least, evaluations of modern NLG paradigms for such creative tasks have been elusive, unfortunately. This paper introduces and addresses the task of coherent story generation fr...
Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text
It is widely accepted that translating user-generated (UG) text is a difficult task for modern statistical machine translation (SMT) systems. The translation quality metrics typically used in the SMT literature reflect the overall quality of the system output but provide little insight into what exactly makes UG text translation difficult. This paper analyzes in detail the behavior of a state-of...
Response-based Learning for Grounded Machine Translation
We propose a novel learning approach for statistical machine translation (SMT) that makes it possible to extract supervision signals for structured learning from an extrinsic response to a translation input. We show how to generate responses by grounding SMT in the task of executing a semantic parse of a translated query against a database. Experiments on the GEOQUERY database show an improvement of about...
The Circle of Meaning: from Translation to Paraphrasing and Back
Title of dissertation: THE CIRCLE OF MEANING: FROM TRANSLATION TO PARAPHRASING AND BACK. Nitin Madnani, Doctor of Philosophy, 2010. Dissertation directed by: Professor Bonnie Dorr, Department of Computer Science. The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this ...